Teaching R

A brief introduction

Kumar Ramanathan

@kumarhk

May 20, 2021

Warm up

Let’s do some polling! Go to pollev.com/kumarr436

Learning objectives for today

  1. Construct a lesson plan for an R workshop/session
  2. Understand the examples-and-exercises approach to teaching R
  3. Articulate questions about how to teach R workshops/sessions
  4. Identify resources to address those questions

Outline

  • Preparing to teach
  • Determining the scope of your lesson
  • Building your lesson (nuts-and-bolts)
  • Structuring your lesson (what goes after what?)
  • Examples and Exercises

Preparing to teach

Why should you learn to teach R?

  • Teaching is the best way to learn 🤓
  • Lots of opportunities to teach R workshops/etc at Northwestern 💸
  • Build your teaching portfolio 🗂
  • Training for certain non-academic career paths 👩‍💻

Questions to ask yourself

  • What are your learning objectives for the course/session/workshop?
  • What are the opportunities and constraints you’ll have in your teaching environment?
  • How much background do your students have? How much do they need?

Learning objectives

  • Much of this will be determined by context: the students may need specific skills (e.g. regression, data viz) for a class, you may be teaching general-purpose skills for data analysis, etc.
  • I believe every R workshop should share these learning objectives: Articulate questions about the [technique/method taught] and Identify resources to address those questions.
  • In simpler language: you want students to walk away knowing how to ask for help when they run into problems.

Teaching environment

  • One-off workshop vs. series
  • Virtual vs. in-person
  • Solo teaching vs. group teaching

Determining the scope of your lesson

Background knowledge

  • What do the students already know? You may be able to answer this from program structure, or through a survey.

  • Sometimes students will be coming in with different levels of background knowledge. You can try to address this before the session and/or adjust your teaching accordingly.

  • Where can students get background knowledge before the session?

    • Pre-session videos
    • DataQuest
    • learnr tutorials

Do you need to cover the basics? What are the basics?

  • Installing R and RStudio
  • Navigating RStudio
  • “code”, “comments”, “objects”
  • Syntax and data types
  • Data structures
  • Reading and writing files
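
If you do cover the basics, a single short script can touch most of the items above. This is only a sketch: the object names, the toy data, and the temporary file are made up for illustration, and it uses base R only.

```r
# A comment: R ignores everything after the # symbol.

# Objects: store a value under a name with the assignment operator <-
workshop_size <- 12

# Data types: numeric, character, and logical values
class(workshop_size)
class("hello")
class(TRUE)

# Data structures: a vector and a data frame (toy data, invented for this sketch)
scores <- c(90, 85, 77)
students <- data.frame(
  name  = c("Ana", "Ben", "Chloe"),
  score = scores
)

# Writing and reading files: a temporary CSV file keeps the example self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(students, csv_path, row.names = FALSE)
students_again <- read.csv(csv_path)
head(students_again)
```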

Building your lesson


  • Befriend RMarkdown (and RProjects)
  • Consider how to store and share materials: GitHub, Box folder, something else, none of the above
  • Other types of tools: RStudio Cloud
  • Connect to other course materials, pre-workshop assignments, etc.

Structuring your lesson

Where to start

Motivation

  • Use the end point as motivation: show them what they will learn!
  • Pick some real data that the students are likely to be interested in

Help students feel comfortable

  • Remind them that they are learning a skill, which only comes with practice
  • Encourage your students to learn from each other

Core content

I usually outline the core content by:

  1. Drafting the end products and breaking down each step.
  2. Looking up existing teaching materials on the same/related topic.

I strongly suggest an examples and exercises approach to teaching skills in R. We’ll practice this in a moment.

Building flexibility into your lesson

  • Plan for a little bit more material than you can teach
  • Provide data files that students can play around with
  • For complex skills, give yourself wiggle room to skip over exercises based on timing/interest

Ending with encouragement

I always like to end with:

  1. Main takeaways
  2. Resources
  3. Reminder that they’ve just learned to create things!

Examples and Exercises

Learning by doing


Open the RProject file and look in the working directory: you will see an exercises subdirectory and an answers subdirectory.

The following lesson snippets all use .R code files for the exercises. You can also ask students to use .Rmd, especially if this is part of a course where you will need to collect assignment submissions.
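
As a rough illustration of the examples-and-exercises format, an exercise file can pair a worked example with a prompt that students complete themselves. The file names, the built-in mtcars data, and the prompt wording below are assumptions made for this sketch; they are not the contents of the actual exercises or answers subdirectories.

```r
# exercises/exercise_sketch.R (hypothetical file name)
library(ggplot2)

# ---- Example: the instructor runs and explains this code ----
# Scatterplot of car weight against fuel efficiency, using the built-in mtcars data
example_plot <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  labs(title = "Fuel efficiency by weight", x = "Weight (1000 lbs)", y = "Miles per gallon")
example_plot

# ---- Exercise: students write their own code ----
# Plot miles per gallon (mpg) as a function of horsepower (hp), and add labels.
# Write your code below this line:


# answers/exercise_sketch_answers.R (hypothetical file name)
# One possible answer, kept in a separate file so students try the exercise first
answer_plot <- ggplot(mtcars) +
  geom_point(aes(x = hp, y = mpg)) +
  labs(title = "Fuel efficiency by horsepower", x = "Horsepower", y = "Miles per gallon")
answer_plot
```

Keeping the example, the prompt, and the answer in the same order every time helps students know when to watch and when to type.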

Lesson snippets

Lesson snippet 1: ggplot and the grammar of graphics

Components of a basic plot

  • data: a data frame, provided to the ggplot() function
  • geometric objects: the objects/shapes that you want to plot, indicated through one of the many available geom functions, such as geom_point() or geom_histogram()
  • aesthetic mapping: the mapping from the data to the geometric objects, provided in an aes() function nested within ggplot() or a geom function
  • connected with the + operator
ggplot(data = <DATA FRAME>) + 
  <GEOM_FUNCTION>(mapping = aes(<VARIABLES>))

# Shorthand: the data and mapping argument names can be omitted
ggplot(<DATA FRAME>) + 
  <GEOM_FUNCTION>(aes(<VARIABLES>))

Prepare data

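# Load packages: dplyr provides glimpse() and filter(), ggplot2 provides the plotting functions
library(dplyr)
library(ggplot2)

# Load data
gapminder <- gapminder::gapminder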

# Look at the structure of the data. You can use glimpse(), summary(), or head().
glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
# Create a new data frame with only the data for 2007
gapminder07 <- filter(gapminder, year==2007)

A basic scatterplot

ggplot(gapminder) + 
    geom_point(aes(x=year, y=pop))

A basic scatterplot

These will produce the same output:

ggplot(gapminder) + 
    geom_point(aes(x=year, y=pop))
ggplot(gapminder, aes(x=year, y=pop)) + 
    geom_point()

Add labels to the plot

ggplot(gapminder) + 
    geom_point(aes(x=year, y=pop)) + 
    labs(title="Population over time", x="Year", y="Population")

Your turn!

Plot life expectancy as a function of GDP per capita for the year 2007, and add labels.

  • Step 1: Supply the data gapminder07 to ggplot()
  • Step 2: Choose geom_point() and supply x=gdpPercap and y=lifeExp to aes()
  • Step 3: Add title, x, and y in labs()

Your turn!

ggplot(gapminder07) + 
    geom_point(aes(x=gdpPercap, y=lifeExp)) + 
    labs(title="Do people in richer countries live longer?", x="GDP per capita", y="Life expectancy")

Choosing geoms

There are many geom functions we can choose from to generate geometric objects.

Let’s try to add geom_smooth() to the previous plot we created.

Smoothed conditional means

ggplot(gapminder07) + 
    geom_point(aes(x=gdpPercap, y=lifeExp)) + 
    geom_smooth(aes(x=gdpPercap, y=lifeExp)) + 
    labs(title="Do people in richer countries live longer?", x="GDP per capita", y="Life expectancy", subtitle="Gapminder 2007 data")
# Equivalent: set the shared mapping once in ggplot() so that both geoms inherit it
ggplot(gapminder07, aes(x=gdpPercap, y=lifeExp)) + 
    geom_point() + 
    geom_smooth() + 
    labs(title="Do people in richer countries live longer?", x="GDP per capita", y="Life expectancy", subtitle="Gapminder 2007 data")

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
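
The message above reports the smoothing method and formula that geom_smooth() picked by default. As a sketch of one option, you can name both explicitly, which makes the choice visible to students and avoids the message. The object name smooth_plot is introduced here for illustration only.

```r
library(dplyr)
library(ggplot2)

# Recreate the 2007 subset so this block runs on its own
gapminder07 <- filter(gapminder::gapminder, year == 2007)

smooth_plot <- ggplot(gapminder07, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # Same method and formula as the defaults reported in the message, stated explicitly
  geom_smooth(method = "loess", formula = y ~ x) +
  labs(title = "Do people in richer countries live longer?",
       x = "GDP per capita", y = "Life expectancy",
       subtitle = "Gapminder 2007 data")
smooth_plot
```

Swapping in method = "lm" would draw a straight regression line instead, which can be a useful contrast to discuss.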

Horizontal line

ggplot(gapminder07) + 
    geom_point(aes(x=gdpPercap, y=lifeExp)) + 
    geom_hline(aes(yintercept=mean(lifeExp))) + 
    labs(title="Do people in richer countries live longer?", x="GDP per capita", y="Life expectancy", subtitle="Gapminder 2007 data")

Your turn!

Plot the life expectancy of each continent in 2007.

Look at the ggplot cheatsheet and decide which kind of geom to use.

Life expectancy in each continent

ggplot(gapminder07) + 
    geom_boxplot(aes(x=continent, y=lifeExp)) + 
    labs(title="Distribution of life expectancy", x="Continent", y="Life expectancy")

Life expectancy in each continent

ggplot(gapminder07) + 
    geom_col(aes(x=continent, y=lifeExp)) + 
    labs(title="Distribution of life expectancy", x="Continent", y="Distribution")
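
A point worth raising with students: geom_col() uses the y values as bar heights, and with one row per country the segments stack, so each bar above is the sum of life expectancy across the countries in that continent. A minimal sketch of one alternative, assuming a mean per continent is what we want to show (continent_means and mean_lifeExp are names introduced for this example):

```r
library(dplyr)
library(ggplot2)

# Recreate the 2007 subset so this block runs on its own
gapminder07 <- filter(gapminder::gapminder, year == 2007)

# Collapse the data to one row per continent before plotting
continent_means <- gapminder07 %>%
  group_by(continent) %>%
  summarize(mean_lifeExp = mean(lifeExp))

mean_plot <- ggplot(continent_means) +
  geom_col(aes(x = continent, y = mean_lifeExp)) +
  labs(title = "Mean life expectancy by continent, 2007",
       x = "Continent", y = "Mean life expectancy")
mean_plot
```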

Grouping variables

You can think of the continent or year variables as grouping variables: they place each observation in one of several groups.

We can represent the groups through aesthetic mapping or facets rather than along one of the axes.

Grouping by color

ggplot(gapminder) + 
    geom_point(aes(x=gdpPercap, y=lifeExp, col=year)) + 
    labs(title="Do people in richer countries live longer?", x="GDP per capita", y="Life expectancy")
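
Because year is stored as an integer, ggplot2 maps it to a continuous color gradient in the plot above. A sketch of one variation, assuming you want each year treated as its own group: wrap year in factor() so that every year gets a distinct color. The object name discrete_plot and the legend title are choices made for this example.

```r
library(ggplot2)

gapminder <- gapminder::gapminder

discrete_plot <- ggplot(gapminder) +
  # factor(year) turns the integer year into a categorical grouping variable
  geom_point(aes(x = gdpPercap, y = lifeExp, col = factor(year))) +
  labs(title = "Do people in richer countries live longer?",
       x = "GDP per capita", y = "Life expectancy", col = "Year")
discrete_plot
```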

Grouping by facet

ggplot(gapminder, aes(x=gdpPercap, y=lifeExp)) + 
    geom_point() + 
    geom_smooth() + 
    facet_wrap(~year) + 
    labs(title="Do people in richer countries live longer?", x="GDP per capita", y="Life expectancy")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Grouping by facet … scales="free"

Try adding the argument scales="free" to the facet_wrap() layer.

Grouping by facet … scales="free"

ggplot(gapminder, aes(x=gdpPercap, y=lifeExp)) + 
    geom_point() + 
    geom_smooth() + 
    facet_wrap(~year, scales="free") + 
    labs(title="Do people in richer countries live longer?", x="GDP per capita", y="Life expectancy")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Your turn!

Visualize life expectancy by continent in 2007 again. This time, group continents by color or facet.

Grouping by color

ggplot(gapminder07, aes(x=gdpPercap, y=lifeExp, 
                        col=continent, size=pop)) + 
    geom_point() + 
    labs(title="Life expectancy as a function of GDP per capita, by continent", x="GDP per capita", y="Life expectancy")

Grouping by facet

ggplot(gapminder07, aes(x=lifeExp)) + 
    geom_density() + 
    facet_wrap(~continent, scales="free") + 
    labs(title="Distribution of life expectancy, by continent", x="Life expectancy", y="Density")

Cumulative exercise

Choose your own adventure!

Create a plot that includes two geoms and facets
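
There is no single right answer here. One possible answer, sketched under the assumption that students use the full gapminder data: points plus a linear trend line, faceted by continent. The object name adventure_plot and the specific choices (alpha, method = "lm") are illustrative only.

```r
library(ggplot2)

gapminder <- gapminder::gapminder

adventure_plot <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  # Geom 1: one semi-transparent point per country-year
  geom_point(alpha = 0.3) +
  # Geom 2: a straight-line fit within each facet
  geom_smooth(method = "lm", formula = y ~ x) +
  # Facets: one panel per continent, each with its own axis ranges
  facet_wrap(~continent, scales = "free") +
  labs(title = "Life expectancy and GDP per capita, by continent",
       x = "GDP per capita", y = "Life expectancy")
adventure_plot
```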

Lesson snippet 2: working with dates

[under construction – answers/exercise2_answers.R shows what will be covered]

Lesson snippet 3: regression and coefplot

[under construction – answers/exercise1_answers.R shows what will be covered]

Wrapping up

Takeaways

  1. Lesson planning is really important: Develop learning objectives. Assess what your students already know + what they need.
  2. Use the tools that R provides: RMarkdown, RProjects, and RStudio itself.
  3. Construct your lessons around examples and exercises, so that students practice writing code during the session.
  4. Lean on the broad community of R users and social scientists. You almost never have to start from scratch.

Resources

Examples from NU:

Other resources:

Take it to the next level (suggestions from Christina Maimone):

You can do it!

Is “Full House” still popular?